Subanta pada analyzer for Sanskrit
Abstract
Natural language processing (NLP) has wide coverage in application areas such as machine translation, text-to-speech conversion, semantic analysis, semantic role labeling, and knowledge representation. Morphological and syntactic processing are components of NLP that process each word to produce the syntactic structure of the sentence with respect to its grammar; semantic analysis follows syntactic analysis. One of the key tasks in morphological analysis is identifying the correct root word from its inflected form. In Sanskrit, inflected words follow rules that can be used to separate the root word from its suffix, and the extracted suffixes carry a substantial amount of syntactic and semantic information. To develop such a word splitter, the sandhi rules given in the grammar of the Sanskrit language have been used. The challenge lies in identifying the junction point (breaking point) of the word, since multiple junction points can be obtained within a single word. The developed system maintains a database of all possible suffixes, which are then used for splitting the word. The paper presents the algorithm, together with solutions to the problems faced while developing the module.
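The suffix-database approach described above can be illustrated with a small sketch. This is not the authors' algorithm or data: the suffix table is a hypothetical fragment listing only the surface endings of masculine a-stem nouns (e.g. deva, rāma) in IAST transliteration, and it ignores sandhi with neighbouring words. The hypothetical function `candidate_splits` tries every junction point in the word, keeps each split whose right-hand part is a known suffix, and restores the stem-final -a absorbed into the ending; `analyze` then filters the candidates against an assumed stem lexicon.

```python
import unicodedata

# Hypothetical fragment of a suffix database (not the paper's): surface endings
# of masculine a-stem nouns in IAST, each with a case/number label. Every ending
# includes the stem-final -a, which is restored when the stem is recovered.
_RAW_SUFFIXES = [
    ("aḥ", "nom.sg"), ("au", "nom/acc/voc.du"), ("āḥ", "nom/voc.pl"),
    ("am", "acc.sg"), ("ān", "acc.pl"),
    ("ena", "ins.sg"), ("eṇa", "ins.sg"), ("ābhyām", "ins/dat/abl.du"), ("aiḥ", "ins.pl"),
    ("āya", "dat.sg"), ("ebhyaḥ", "dat/abl.pl"),
    ("āt", "abl.sg"),
    ("asya", "gen.sg"), ("ayoḥ", "gen/loc.du"), ("ānām", "gen.pl"), ("āṇām", "gen.pl"),
    ("e", "loc.sg"), ("eṣu", "loc.pl"),
]
# Normalize keys to NFC so precomposed and decomposed diacritics both match.
SUFFIX_TABLE = {unicodedata.normalize("NFC", s): label for s, label in _RAW_SUFFIXES}
STEM_FINAL = "a"  # thematic vowel absorbed into every ending above


def candidate_splits(word):
    """Return every (stem, suffix, label) split whose right part is a known suffix."""
    word = unicodedata.normalize("NFC", word)
    splits = []
    for i in range(1, len(word)):  # i = junction point; the stem part is never empty
        ending = word[i:]
        if ending in SUFFIX_TABLE:
            splits.append((word[:i] + STEM_FINAL, ending, SUFFIX_TABLE[ending]))
    return splits


def analyze(word, lexicon):
    """Keep only the splits whose recovered stem appears in a (hypothetical) stem lexicon."""
    return [split for split in candidate_splits(word) if split[0] in lexicon]


if __name__ == "__main__":
    lexicon = {"deva", "rāma"}  # assumed stem lexicon for the demo
    for w in ["devebhyaḥ", "rāmeṇa", "devasya"]:
        print(w, candidate_splits(w), analyze(w, lexicon))
```

For devebhyaḥ the splitter proposes both deva + ebhyaḥ (dative/ablative plural) and the spurious devebhya + aḥ, which shows how more than one junction point can arise within a single word; only the first survives the lexicon check. A full system would also need sandhi handling and endings for the other stem classes and genders.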
Similar resources
Inflectional Morphology Analyzer for Sanskrit
The paper describes a Sanskrit morphological analyzer that identifies and analyzes inflected noun-forms and verb-forms in any given sandhi-free text. The system, which has been developed as a Java servlet with an RDBMS, can be tested at http://sanskrit.jnu.ac.in (Language Processing Tools > Sanskrit Tinanta Analyzer/Subanta Analyzer) with Sanskrit data as Unicode text. Subsequently, the separate systems of ...
Analyzing English Phrases from Pāṇinian Perspective
This paper explores Pāṇinian Grammar (PG) as an information processing device in terms of ‘how’, ‘how much’ and ‘where’ languages encode information. PG is based on a morphologically rich language, Sanskrit. We apply PG to English and see how the Pāṇinian perspective would deal with it from the information-theoretic point of view, and its effectiveness in machine translation. We analyze En...
Semantic Processing of Compounds in Indian Languages
Compounds occur very frequently in Indian Languages. There are no strict orthographic conventions for compounds in modern Indian Languages. In this paper, Sanskrit compounding system is examined thoroughly and the insight gained from the Sanskrit grammar is applied for the analysis of compounds in Hindi and Marathi. It is interesting to note that compounding in Hindi deviates from that in Sansk...
Applying Sanskrit Concepts for Reordering in MT
This paper presents a rule-based reordering approach for English-Hindi machine translation. We have used the concept of pada, from Pāṇinian Grammar to frame the reordering rules. A pada is a word form which is ready to participate in a sentence. The rules are generic enough to apply on any English-Indian language pair. We tested the rules on English-Hindi language pair and obtained better com...
Handling of Infinitives in English to Sanskrit Machine Translation
The development of a Machine Translation (MT) system for an ancient language like Sanskrit is a fascinating and challenging task. In this paper, the authors handle the infinitive type of English sentences in the English to Sanskrit machine translation (EST) system. The EST system is an integrated model of a rule-based approach of machine translation with Artificial Neural Network (ANN) model that tr...